PROCESS FOR MAKING GENES ENCODING RANDOM POLYMERS OF AMINO ACIDS



Background of the Invention 
Copolymer 1 (COP-1) is a synthetic polypeptide analog of myelin basic protein (MBP), which is a
natural component of the myelin sheath. It has been suggested as a potential therapeutic agent for multiple
sclerosis (Eur. J. Immunol. [1971] 1:242; and J. Neurol. Sci. [1977] 31:433). Interest in COP-1 as an
immunotherapy for multiple sclerosis stems from observations first made in the 1950s that myelin
components such as MBP prevent or arrest experimental autoimmune encephalomyelitis (EAE). EAE is a
disease resembling multiple sclerosis that can be induced in susceptible animals.

COP-1 was developed by Drs. Sela, Arnon, and their co-workers at the Weizmann Institute (Rehovot,
Israel). It was shown to suppress experimental allergic encephalomyelitis (EAE) (Eur. J. Immunol. [1971]
1:242-248; U.S. Patent No. 3,849,550). More recently, COP-1 was shown to be beneficial for patients with
the exacerbating-remitting form of multiple sclerosis (N. Engl. J. Med. [1987] 317:408). Patients treated
with daily injections of COP-1 had fewer exacerbations and smaller increases in their disability status than
control patients.

COP-1 is a mixture of polypeptides composed of alanine, glutamic acid, lysine, and tyrosine in a molar
ratio of approximately 6:2:5:1, respectively. It is synthesized by chemically polymerizing the four amino
acids, forming products with average molecular weights of 23,000 daltons. Although the resulting
polypeptides are comprised of the same amino acid components, they differ with respect to their amino acid
sequences. In fact, there are 10^100 possible ways to assemble a 23,000 dalton polypeptide composed of
alanine, glutamic acid, lysine and tyrosine in the designated ratios. Purification of one or even a small
number of distinct COP-1 polypeptides from chemically-synthesized COP-1 is not possible.

Studies evaluating COP-1's efficacy have been hindered somewhat by inconsistent batches of COP-1.
Also, it is not known which of the amino acid polymer(s) is responsible for the biological activity of COP-1.
Other random sequence amino acid copolymers related to COP-1 have been synthesized and
tested for the ability to suppress experimental allergic encephalomyelitis (Eur. J. Immunol. [1973] 3:273;
Immunochemistry [1976] 13:333). Biological activity was observed in EAE assays using COP-1 related
polymers in which one of the following changes occurs: tyrosine is replaced by tryptophan; or glutamic acid
is replaced by aspartic acid; or tyrosine is excluded.

We have developed procedures for synthesizing genes encoding polypeptides composed of specific
amino acids, but having random amino acid sequences. The amino acid composition of the polypeptides is
dictated by the set of codons incorporated in the synthetic genes. Likewise, the size of the polypeptides is
controlled by synthesizing genes of specific lengths.
Brief Summary of the Invention

The subject invention concerns a method for synthesizing genes which encode random polymers of
amino acids. A further aspect of the invention is the identification of certain polypeptides which are
expressed by the synthetic genes and which have high levels of biological activity. The general
methodology of the subject invention is outlined in Figure 1.

A critical step in the novel process of the subject invention is the polymerization of small
oligonucleotide duplexes. Preferably, the oligonucleotide duplexes consist of a multiple of three nucleotides.
The length of the synthetic genes can be controlled through the use of adaptors specific for the 5' and 3'
ends of the oligonucleotides. Further, the composition of the resultant polypeptides can be varied with
respect to the relative proportions of the amino acid constituents, by varying the proportions of input
oligonucleotides.

The synthesis of polypeptides similar to COP-1 exemplifies the procedures of the subject invention. The
initial step of the procedure is the synthesis of genes which code for polypeptides consisting of
predetermined amino acid constituents. These genes are then cloned in an expression vector and introduced into
E. coli such that each recombinant bacterial colony contains one COP-1 gene. To generate a mixture of COP-1
polypeptides (analogous to the chemically synthesized product) we produce COP-1 polypeptides from a
pool of recombinant bacterial colonies containing COP-1 gene sequences, e.g., 1000 colonies.
The efficacy of the pool of recombinant COP-1 polypeptides is tested in experimental allergic
encephalomyelitis (EAE) assays. If effective, the pool of colonies exhibiting activity is further subdivided
into smaller pools, and the polypeptides from these smaller pools are tested. By repeatedly
subdividing and selecting the most active pools, we identify individual recombinant COP-1 polypeptides or
groups of polypeptides with biological activity in EAE assays equal to or higher than chemically
synthesized COP-1. The opportunity to characterize homogeneous, individual COP-1 polypeptides is unique
to this invention.

The subject invention also concerns the synthetic genes and the polypeptides produced by the
methods disclosed herein. Advantageously, the procedures of the subject invention can be used to produce
polypeptides which may be useful in preventing, arresting, or controlling demyelinating disorders such as
multiple sclerosis. A preferred copolymer according to the subject invention consists substantially of
alanine, lysine, and either glutamic or aspartic acid. Polymers of any length or molecular weight can be
synthesized using the procedures of the subject invention. Another preferred copolymer further includes
tyrosine. Specifically, a preferred copolymer may consist of alanine, lysine,
glutamic acid, and tyrosine, and have a molecular weight between about 5,000 and 50,000 daltons. Further,
the method of the subject invention can also be used to make fusion proteins.



Brief Description of the Drawings 



Figure 1 depicts the general methodology for synthesizing random genes and identifying polypeptides
with specific activities.
Figure 2 shows the strategy for synthesizing genes encoding random sequence polypeptides.
Figure 3 shows all possible 3-amino acid combinations and their percent occurrence in COP-1.
Figure 4 shows 9-nucleotide duplexes and adaptors for COP-1 gene synthesis.
Figure 5 shows the construction and cloning of synthetic COP-1 genes.
Figure 6 provides the sequence analysis of one synthetic gene revealing proper junctions between
duplexes.
Figure 7 is a Western blot showing the fusion protein produced by four different clones.
Figure 8 shows variations of random-gene synthesis using small duplexes.
Figure 9 shows synthesis of single-stranded DNA encoding random sequence amino acid polymers.
Figure 10 shows a phosphoramidite trinucleotide for random gene synthesis using a DNA synthesizer.
Figure 11 shows the DNA and amino acid sequences of rCOP-1-77.
Figure 12 shows the DNA and amino acid sequences of rCOP-1-19.
Figure 13 shows the average EAE scores of disease-induced guinea pigs which were untreated or
treated with myelin basic protein, rCOP-1-19, or rCOP-1-77.

Detailed Description of the Invention

Random-sequence genes are synthesized by assembling small oligonucleotide duplexes because more sequence
variation is obtained by varying the order of small rather than large duplexes. In contrast, conventional
gene synthesis requires that the duplexes be joined in one specific order.
To achieve this ordering, the sticky ends at each junction must be unique. Thus, the process of the subject
invention is unique in that random-sequence genes are synthesized using oligonucleotide duplexes
encoding small segments of amino acids, and the sticky ends on each duplex are the same.

Our procedure for synthesizing genes encoding recombinant COP-1 polypeptides entails using
oligonucleotide duplexes encoding segments of 3 amino acids. All possible permutations of the 3 amino
acid segments comprised of the four COP-1 amino acids and their percent occurrences in COP-1 are
shown in Figure 3. To make recombinant COP-1 genes we have synthesized oligonucleotides corresponding
to the coding and noncoding strands for some of the 3 amino acid segments (Figure 4). The
oligonucleotides are phosphorylated at the 5' ends. Complementary pairs of oligonucleotides are annealed,
forming duplexes with the 3' nucleotide extending on each strand: adenosine on the coding strand, and
thymidine on the noncoding strand. Adenosine and thymidine base pair with one another, thus ensuring that
the duplexes are joined directionally, that is, coding strands to other coding strands. Since the duplexes
have the same nucleotide extensions, they can align in any order. When duplexes corresponding to all
sixty-four 3-amino acid blocks are mixed and ligated, COP-1 genes with all possible sequences are
produced.

To control the size of the synthetic genes, we have included adaptors specific for the 5' and 3' ends.
One strand of each duplex adaptor is not phosphorylated (see Figure 4). As a result, ligation products
terminate at the hydroxylated ends. By varying the ratio of the adaptor duplexes in the reaction, we can
control the length of the synthetic genes. The adaptors serve the second function of adding specific
extensions required to directionally clone the ligation products into the expression vector. The relative
proportions of amino acid constituents in the random polymers can also be modulated by mixing the
oligonucleotide duplexes in different ratios. For example, duplexes can be added to ligations in proportions
dictated by the percent occurrence of the corresponding amino acid segments in COP-1 (Figure 3).
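
The block-mixing idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it takes the percent occurrence of each 3-amino-acid block to be the product of the molar fractions implied by the 6:2:5:1 ratio (Figure 3 itself is not reproduced here), and the function names are hypothetical.

```python
import itertools
import random

# Molar ratio of the four COP-1 amino acids quoted in the text (A:E:K:Y).
MOLAR_RATIO = {"A": 6, "E": 2, "K": 5, "Y": 1}


def block_fractions(ratio=MOLAR_RATIO):
    """Expected fraction of each of the 4**3 = 64 three-residue blocks.

    Assumption: residues are independent, so a block's fraction is the
    product of the molar fractions of its three residues.
    """
    total = sum(ratio.values())
    fractions = {}
    for block in itertools.product(ratio, repeat=3):
        p = 1.0
        for aa in block:
            p *= ratio[aa] / total
        fractions["".join(block)] = p
    return fractions


def random_polypeptide(fractions, n_blocks, rng=None):
    """Join n_blocks blocks in random order, weighted by their fractions.

    This mimics ligating duplexes that all carry the same extensions and
    can therefore align in any order.
    """
    rng = rng or random.Random()
    blocks = list(fractions)
    weights = [fractions[b] for b in blocks]
    return "".join(rng.choices(blocks, weights=weights, k=n_blocks))
```

Calling `random_polypeptide(block_fractions(), 70)` would give one 210-residue sequence; repeated calls give different sequences with roughly the same composition.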

The number of different duplexes incorporated determines the sequence complexity of the synthetic
genes. For example, COP-1 genes can be constructed using fewer than 64 duplexes, but not all sequence
combinations will occur. The amount of sequence complexity required will depend on the application of the
polypeptides.

One complication arises in producing completely random COP-1 amino acid sequences. For duplexes
to have the same extensions, the 3' nucleotide on the coding strands must be the same. Codons for
alanine, glutamic acid, and lysine can end with adenosine; however, for tyrosine, the last nucleotide is either
cytidine or uridine. Thus, duplexes encoding three amino acid segments ending in tyrosine (fourth column
on Figure 3) will have different extensions than the other duplexes. This limitation can be overcome by
making a second noncoding strand for each duplex with an extension of guanosine or adenosine. Because
this solution requires significantly more DNA synthesis, we have elected to exclude duplexes corresponding
to three amino acid segments ending in tyrosine.
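
The codon constraint can be checked directly against the standard genetic code. The sketch below lists the codons for the four COP-1 amino acids (written as DNA coding-strand triplets, so uridine appears as T) and counts how many of the 64 blocks remain once blocks ending in tyrosine are excluded. The function names are illustrative only.

```python
import itertools

# Standard genetic code entries for the four COP-1 amino acids,
# written as DNA coding-strand triplets (T in place of U).
CODONS = {
    "A": ("GCT", "GCC", "GCA", "GCG"),  # alanine
    "E": ("GAA", "GAG"),                # glutamic acid
    "K": ("AAA", "AAG"),                # lysine
    "Y": ("TAT", "TAC"),                # tyrosine
}


def codons_ending_in(base):
    """Map each amino acid to its codons whose last nucleotide is `base`."""
    return {aa: [c for c in codons if c.endswith(base)]
            for aa, codons in CODONS.items()}


def usable_blocks(base="A"):
    """Three-residue blocks whose last residue has a codon ending in `base`."""
    allowed = {aa for aa, cs in codons_ending_in(base).items() if cs}
    return ["".join(b) for b in itertools.product(CODONS, repeat=3)
            if b[-1] in allowed]
```

With `base="A"`, tyrosine has no qualifying codon, so the sixteen blocks ending in Y drop out and 48 of the 64 blocks remain.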
Once the synthetic genes are made, they are cloned into a suitable expression vector and transferred into
a host capable of expressing the polypeptides. The host may be bacterial or eukaryotic cells. With bacteria,
cells are grown under conditions that permit formation of recombinant colonies. Each colony will contain
and express one synthetic gene. The polypeptides expressed by cultures from specific colonies are isolated
and tested for the relevant biological or immunological activity. For example, COP-1 polypeptides are tested
for the ability to suppress EAE.

To screen a large number of polypeptides for activity, it may be advantageous to pool the polypeptides
before testing. Pools of polypeptides can be generated by either combining colonies and isolating the
polypeptides produced by the mixed culture or by culturing individual colonies, isolating the
polypeptides from each culture, and then combining the purified polypeptides. By testing pools of polypeptides it is
possible to more quickly determine which of the colonies express biologically or immunologically active
polypeptides. For example, if polypeptides from 100 colonies do not exhibit activity, these colonies can be
eliminated from further investigation. If, on the other hand, the mixture of polypeptides does exhibit the
desired activity, then sequential subsets of the pooled polypeptides can be tested until the active
polypeptides are identified.

Alternatively, some activities can be assayed for polypeptides directly on the colonies, thus eliminating the
need for isolating polypeptides. For example, reactivity of polypeptides with antisera can be assessed in
colonies grown and lysed on nitrocellulose filters.
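
The pooling strategy described above is essentially a divide-and-test search. The following sketch assumes an idealized assay, a hypothetical callable that returns true whenever a pool contains at least one active colony; real EAE assays are graded and can be diluted out in large pools, so this only illustrates the bookkeeping.

```python
def find_active(colonies, assay):
    """Return the colonies that test positive, by repeatedly halving pools.

    `assay(pool)` is a hypothetical yes/no activity test on pooled
    polypeptides. A negative pool is discarded without further testing.
    """
    if not colonies or not assay(colonies):
        return []
    if len(colonies) == 1:
        return list(colonies)
    mid = len(colonies) // 2
    return find_active(colonies[:mid], assay) + find_active(colonies[mid:], assay)


def make_assay(active, log):
    """Build an idealized assay that records the size of each pool tested."""
    def assay(pool):
        log.append(len(pool))
        return any(colony in active for colony in pool)
    return assay
```

For example, with 100 colonies of which two are active, the search tests far fewer than 100 pools, because each negative pool removes all of its members at once.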


Materials and Methods 
Synthesis and Phosphorylation of Oligonucleotides. The coding and noncoding oligonucleotides for
each duplex are synthesized by the phosphite triester method with an Applied Biosystems Model 380 DNA
synthesizer. The 5' ends of oligonucleotides are phosphorylated on the DNA synthesizer using
2-[2-(4,4'-dimethoxytrityloxy)ethylsulfonyl]ethyl-(2-cyanoethyl)-(N,N-diisopropyl)-phosphoramidite from Glen Research.

Purification of Oligonucleotides. The phosphorylated form of the oligonucleotide is separated from the
crude mixture by electrophoresis through a 20% acrylamide gel containing 7 M urea. Oligomers are eluted
from gel slices and desalted on Sep-Pak C18 cartridges. Separation of the 5' phosphorylated
oligomer from hydroxylated forms is critical because hydroxylated oligomer causes the ligation reactions to
terminate prematurely, yielding very small products.

Annealing and Ligation of Oligonucleotides. Mixtures containing 1 nmol each of a coding and
complementary noncoding strand, 100 mM Tris-HCl, pH 7.6, and 0.1 mM ethylenediaminetetraacetic acid
(EDTA) in [...] microliters are heated to 80°C in a 600 ml water bath. The reactions are allowed to cool
slowly (2 hours) to room temperature. The temperature is decreased to 4°C over one hour by adding ice to
the water bath. For ligation, equal aliquots of the annealed duplexes are mixed and the solution is adjusted
to contain 10 mM MgCl2, [...] mM adenosine triphosphate (ATP), 1 mM DTT, and 15% polyethylene glycol.
The final concentration of Tris-HCl is 66 mM (from the annealing reactions). The total concentration of
9-mer duplexes is 10 pmol/microliter. Annealed adaptors are included in the reactions at a concentration
[...]. The reaction products are concentrated by ethanol precipitation and [...].

Size Selection of Ligation Products. The concentrated reaction products are electrophoresed on 4%
[...] fragments of approximately this size because COP-1 polypeptides within this range were previously
tested in clinical trials. The agarose plug containing the synthetic DNA is stored at -20°C.

Preparation of Expression Vector 

The pREV2.1 plasmid can be constructed from a plasmid pBG1. Plasmid pBG1 can be isolated from
[...] between pBG1 and pREV2.2 are the following:

[...] standard procedures.

1c. The product plasmid, pBG1ΔN, where the 2160 base pair NdeI fragment is deleted from pBG1,
was selected by preparing plasmid from ampicillin resistant clones and determining the restriction digestion
patterns with NdeI and SalI (product fragments approximately 1790 and 1650). This deletion inactivates the
rop gene that controls plasmid replication.

2a. 5 ug of pBG1ΔN was then digested with EcoRI and BclI and the larger fragment, approximately
2455 base pairs, was isolated.

2b. A synthetic double stranded fragment was prepared by the procedure of Itakura et al. (Itakura, K.,
J. J. Rossi, and R.B. Wallace [1984] Ann. Rev. Biochem. 53:323-356, and references therein) with the
following structure:

5' GATCAAGCTTCTGCAGTCGACGCATGCGGATCCGGTACCCGGGAGCTCG 3'
3'     TTCGAAGACGTCAGCTGCGTACGCCTAGGCCATGGGCCCTCGAGCTTAA 5'



This fragment has BclI and EcoRI sticky ends and contains recognition sequences for several restriction
endonucleases.
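
As a consistency check on the fragment shown above, the two strands can be compared programmatically. The sketch below assumes the layout used in this document: the upper strand written 5' to 3', the lower strand written 3' to 5', and a four-nucleotide 5' overhang at each end. The function name is illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Strands as printed in the text above.
TOP_5TO3 = "GATCAAGCTTCTGCAGTCGACGCATGCGGATCCGGTACCCGGGAGCTCG"
BOTTOM_3TO5 = "TTCGAAGACGTCAGCTGCGTACGCCTAGGCCATGGGCCCTCGAGCTTAA"


def sticky_ends(top_5to3, bottom_3to5, overhang=4):
    """Check base pairing and return both 5' overhangs, read 5' to 3'.

    Returns (paired_ok, left_overhang, right_overhang).
    """
    paired_top = top_5to3[overhang:]          # top strand minus its 5' overhang
    paired_bottom = bottom_3to5[:-overhang]   # bottom strand minus its 5' overhang
    paired_ok = paired_top.translate(COMPLEMENT) == paired_bottom
    left = top_5to3[:overhang]
    right = bottom_3to5[-overhang:][::-1]     # reverse to read 5' to 3'
    return paired_ok, left, right
```

The duplex region pairs correctly, and the overhangs are GATC and AATT, which are the overhangs left by BclI and EcoRI respectively.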

2c. 0.1 ug of the 2455 base pair EcoRI-BclI fragment and 0.01 ug of the synthetic fragment were
joined with T4 DNA ligase and competent cells of strain JM103 were transformed. Cells harboring the
recombinant plasmid, where the synthetic fragment was inserted into pBG1ΔN between the BclI and EcoRI
sites, were selected by digestion of the plasmid with HpaI and EcoRI. The diagnostic fragment sizes are
approximately 2355 and 200 base pairs. This plasmid is called pREV1.

2d. 5 ug of pREV1 were digested with AatII, which cleaves uniquely.

2e. The following double-stranded fragment was synthesized:

5'     CGGTACCAGCCCGCCTAATGAGCGGGCTTTTTTTTGACGT 3'
3' TGCAGCCATGGTCGGGCGGATTACTCGCCCGAAAAAAAAC     5'

This fragment has AatII sticky ends and contains the trpA transcription termination sequence.

2f. 0.1 ug of AatII digested pREV1 was ligated with 0.01 ug of the synthetic fragment in a volume of
20 ul using T4 DNA ligase.

2g. Cells of strain JM103, made competent, were transformed and ampicillin resistant clones
selected.

2h. Using a KpnI, EcoRI double restriction digest of plasmid isolated from selected colonies, a cell
containing the correct construction was isolated. The sizes of the KpnI, EcoRI generated fragments are
approximately 2475 and 80 base pairs. This plasmid is called pREV1TT and contains the trpA transcription
terminator.

3a. 5 ug of pREV1TT, prepared as disclosed above (by standard methods), was cleaved with NdeI
and XmnI and the approximately 850 base pair fragment was isolated.

3b. 5 ug of plasmid pBR325 (BRL, Gaithersburg, MD), which contains the genes conferring
resistance to chloramphenicol as well as to ampicillin and tetracycline, was cleaved with BclI and the ends
blunted with Klenow polymerase and deoxynucleotides. After inactivating the enzyme, the mixture was
treated with NdeI and the approximately 3185 base pair fragment was isolated. This fragment contains the
genes for chloramphenicol and ampicillin resistance and the origin of replication.

3c. 0.1 ug of the NdeI-XmnI fragment from pREV1TT and the NdeI-BclI fragment from pBR325 were
ligated in 20 ul with T4 DNA ligase and the mixture used to transform competent cells of strain JM103.
Cells resistant to both ampicillin and chloramphenicol were selected.

3d. Using an EcoRI and NdeI double digest of plasmid from selected clones, a plasmid was selected
giving fragment sizes of approximately 2480, 1145, and 410 base pairs. This is called plasmid pREV1TT/chl
and has genes for resistance to both ampicillin and chloramphenicol.

4a. The following double-stranded fragment was synthesized:

      MluI      EcoRV ClaI  BamHI
5' CGAACGCGTGGCCGATATCATCGATGG
3' GCTTGCGCACCGGCTATAGTAGCTACC
       SalI  HindIII SmaI
   ATCCGTCGACAAGCTTCCCGGGAGCT 3'
   TAGGCAGCTGTTCGAAGGGCCC     5'

This fragment, with a blunt end and an SstI sticky end, contains recognition sequences for several
restriction enzyme sites.

4b. pREV1TT/chl was cleaved with NruI (which cleaves 20 nucleotides from the BclI
site) and SstI (which cleaves within the multiple cloning site). The larger fragment, approximately 3990 base
pairs, was isolated.

4c. The NruI-SstI fragment from pREV1TT/chl and 0.01 ug of the synthetic fragment were
treated with T4 DNA ligase in a volume of 20 ul.

4d. This mixture was transformed into strain JM103 and ampicillin resistant clones were selected.

4e. Plasmid was purified from several clones and screened by digestion with MluI or ClaI.
Recombinant clones with the new multiple cloning site will give one fragment when digested with either of
these enzymes because each cleaves the plasmid once.

4f. The sequence of the multiple cloning site was verified by restricting the plasmid
with HpaI and PvuII and isolating the 1395 base pair fragment, cloning it into the SmaI site of mp18 and
sequencing it by dideoxynucleotide sequencing using standard methods.

4g. This plasmid is called pREV2.2.

Plasmid pREV2.2 can be isolated from its E. coli host by well known procedures. This plasmid was
deposited in the E. coli host JM103 with the Northern Regional Research Laboratory (NRRL, U.S.
Department of Agriculture, Peoria, Illinois, USA) on July 20, 1986 and was assigned accession number
[...].

Plasmid pREV2.1 was constructed using plasmid pREV2.2 and a synthetic oligonucleotide. An
example of this construction is as follows:

1. Plasmid pREV2.2 is cleaved with restriction enzymes NruI and BamHI and the 4 Kb fragment is
isolated from an agarose gel.

2. The following double-stranded oligonucleotide is synthesized:

5' CGAACGCGTGGTCCGATATCATCGATG 3' 
3' GCTTGCGCACCAGGCTATAGTAGCTACCTAG 5' 
3. The fragments from 1 and 2 are ligated in 20 ul using T4 DNA ligase, transformed into competent
E. coli cells and chloramphenicol resistant colonies are isolated.

4. Plasmid clones are identified that contain the oligonucleotide from 2, spanning the region from the
NruI site to the BamHI site and recreating these two restriction sites. This plasmid is termed pREV2.1.

Cloning the Ligation Products into an Expression Vector. The expression vector is digested to yield
extensions that are compatible with those on the synthetic genes. The agarose plug [...]
microliters T4 DNA ligase (400 units) is added and the reaction is incubated overnight at 16°C.
Competent cells are transformed with the ligation mixtures and recombinants are identified by selection on
[...].

[...] invention. These examples should not be construed as limiting. All percentages are by weight and all
solvent mixture proportions are by volume unless otherwise noted.
Example 1 - Strategy for Synthesis of Random Sequence Genes

Model studies using several oligonucleotide duplexes were performed to assess this method for 
synthesizing genes encoding polypeptides of predetermined amino acid composition but random se- 
quences. The synthetic genes were analyzed with respect to size, ligation junctions, composition, sequence, 
and levels of expression. 

Size. Synthetic genes within broad size ranges are produced by varying the ratio of adaptors to 9-mer 
duplexes. We are able to select genes within more limited size ranges by resolving the ligation products on 
agarose gels and excising gel slices containing products of a certain length. Using these procedures, genes 
in the following three size ranges have been isolated: 75-150, 280-320, and 400-600 nucleotides.
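
How the adaptor ratio shifts the size distribution can be illustrated with a toy model. The sketch below is not the authors' analysis: it assumes that, as a chain grows, each unit added is a terminating adaptor with probability equal to the adaptor fraction in the reaction and a 9-mer coding duplex otherwise. Lengths count only the 9-mer duplexes, not the adaptors, and all names are hypothetical.

```python
import random

DUPLEX_NT = 9  # nucleotides contributed by each coding duplex


def mean_gene_length(adaptor_fraction):
    """Expected coding length (nt) under the toy termination model.

    The number of duplexes added before an adaptor is geometric, with
    mean (1 - p) / p for adaptor fraction p.
    """
    if not 0 < adaptor_fraction <= 1:
        raise ValueError("adaptor_fraction must be in (0, 1]")
    return DUPLEX_NT * (1 - adaptor_fraction) / adaptor_fraction


def simulate_gene_length(adaptor_fraction, rng):
    """Draw one coding length (nt) by adding duplexes until an adaptor joins."""
    n_duplexes = 0
    while rng.random() >= adaptor_fraction:
        n_duplexes += 1
    return n_duplexes * DUPLEX_NT
```

Under this model one adaptor per twenty coding duplexes gives a mean coding length of 180 nucleotides, and lowering the adaptor fraction lengthens the products, in line with the qualitative behavior described above.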

Ligation junctions. The synthetic genes have been sequenced to demonstrate that the 9-mer duplexes
are joined end to end and without insertion or deletion of any nucleotides. Correct junctions are necessary
for maintaining the reading frame and thus producing genes that encode polypeptides of the expected
amino acid composition. Sequence analysis revealed that the junctions between the duplexes are correct
(Figure 6).
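
The importance of exact junctions can be shown with a small translation sketch. The 9-mer coding sequences below are hypothetical choices (one codon per amino acid, each ending in adenosine) for the four blocks named in this Example; the actual oligonucleotides are those of Figure 4 and are not reproduced here.

```python
# Hypothetical codon choices, one per amino acid, each ending in A
# except tyrosine (which has no codon ending in A).
CODON_TABLE = {"GCA": "A", "GAA": "E", "AAA": "K", "TAT": "Y"}

# Hypothetical 9-mer coding strands for four 3-amino-acid blocks.
BLOCK_CODING = {
    "KKA": "AAAAAAGCA",
    "EAE": "GAAGCAGAA",
    "KAK": "AAAGCAAAA",
    "YKK": "TATAAAAAA",
}


def translate(dna):
    """Translate complete codons; codons outside the table become '?'."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "?")
                   for i in range(0, usable, 3))


# A gene built by joining four duplexes end to end.
gene = "".join(BLOCK_CODING[b] for b in ("KKA", "EAE", "KAK", "YKK"))

# The same gene with one nucleotide lost at the first junction.
shifted = gene[:9] + gene[10:]
```

Translating `gene` returns the four blocks in order, while the single-nucleotide deletion in `shifted` leaves the first block intact and scrambles everything downstream of the faulty junction.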

Composition. To control the amino acid composition of the encoded polypeptides, the synthetic genes 
must contain the duplexes added to the ligation reactions. The results from three gene synthesis 
experiments demonstrated that the synthesized genes are composed of the duplexes included in the input 
steps of the synthesis. 

Sequence. Since each duplex can ligate to any other, the order of the 9-mer duplexes in the synthetic 
genes should be random. This randomness is demonstrated in the Example shown in Figure 6. The 
synthetic gene shown in this example is composed of duplexes encoding the following amino acid 
segments: KKA, EAE, KAK and YKK. 

Expression. To measure expression levels, synthetic genes are cloned into vectors such that the
polypeptides are expressed either as fusions with heterologous vector-derived peptide sequences, or as
nonfusion polypeptides. The levels of expression of the fusion products can be readily measured by
Western blot analysis using antisera directed against the vector-derived portion of the fusion protein. Figure
7 demonstrates the expression of four COP-1-containing fusion polypeptides.
Example 2 - Variations for Synthesizing Random Sequence Genes with Small DNA Duplexes



DNA duplexes of other lengths can be used in an approach similar to that described for duplexes of 9 
nucleotides. Since three nucleotides code for one amino acid, strands of 3, 6, 12, 15, or 18 nucleotides can
be annealed forming duplexes that code for small blocks of amino acids (see Figure 8). More sequence 
variation occurs by mixing small rather than large duplexes. 

In addition, terminal extensions of more than one nucleotide can be employed. Figure 8 shows an 
example using duplexes of 12 nucleotides and extensions of 3 nucleotides. XXX represents any codon; the 
codons in the duplexes are varied to produce polypeptides of the desired amino acid composition. This 
approach restricts the polypeptide sequences since the amino acid encoded by the duplex junctions,
alanine (Ala) in this example, is repeated every fourth amino acid. Also, in another variation of the subject
invention, extensions of 5' nucleotides can be employed instead of the 3' extensions illustrated throughout
this report.
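
The periodicity imposed by three-nucleotide extensions is easy to see in a sketch. It assumes each 12-nucleotide unit contributes three variable codons followed by one junction codon; whether the junction codon falls at the start or end of a unit depends on the design in Figure 8, which is not reproduced here, and the function name is illustrative.

```python
def assemble_with_junction(blocks, junction="A"):
    """Join 3-residue variable blocks, adding the junction residue after each.

    Models 12-nucleotide duplexes whose 3-nucleotide extensions encode a
    fixed amino acid (alanine in the example in the text).
    """
    for block in blocks:
        if len(block) != 3:
            raise ValueError("each variable block must have 3 residues")
    return "".join(block + junction for block in blocks)
```

For instance, `assemble_with_junction(["KEK", "YKE", "EKK"])` gives `KEKAYKEAEKKA`, with alanine at every fourth position regardless of which blocks are mixed.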



Example 3 - Synthesis of Single-Stranded Random Sequence Genes 

Gene synthesis using DNA duplexes results in double stranded genes that are ready for cloning into an 
expression vector. An alternative strategy entails producing single-stranded, random sequence genes. The 
application of this method to COP-1 is illustrated in Figure 9. Three-nucleotide oligomers corresponding to 
codons for each of the amino acids in COP-1 are synthesized, mixed in appropriate ratios, and chemically 
polymerized in solution to produce long single-stranded COP-1 genes. The complementary strand of DNA 
is made enzymatically using reverse transcriptase or DNA polymerase. The double-stranded DNA is 
prepared for cloning by digestion and repairing the ends of the molecules. 

Single-stranded, random sequence genes could also be made by performing the polymerization step on 
a DNA synthesizer. Typically, synthetic DNA is assembled one nucleotide at a time using phosphoramidite 
nucleotide precursors. We developed a strategy for synthesizing single-stranded DNA in three nucleotide 
segments ("codons") by using phosphoramidite trinucleotides (Figure 10). The use of 3-nucleotide building
blocks instead of single nucleotides is necessary to ensure that only specific codons, those corresponding
to the phosphoramidite trinucleotides, would occur in the synthetic genes. To test this strategy, we
commissioned the custom synthesis of a phosphoramidite trinucleotide. We observed polymerization of the 
trinucleotide on the DNA synthesizer. 

Example 4 - Expression of Random Sequence Genes in Vectors Producing Fusion Proteins

Synthetic random sequence genes can be cloned into gene fusion vectors so that the expressed
polypeptides are comprised of a vector-derived polypeptide linked to the random sequence polypeptide.
The application of this method to COP-1 is described here. Synthetic COP-1 genes are cloned into the
expression vector denoted pREV 2.1 within the polylinker site. Upon expression from pREV 2.1, a
polypeptide is synthesized comprising the amino-terminal portion (approximately 25 to approximately 45
amino acids) of the bacterial protein linked by a peptide bond to the COP-1 polypeptide.

We have tested twelve different COP-1 genes for expression in pREV 2.1. Fusion proteins were
found to be expressed from ten of the twelve constructs. Figure 7 is a Western blot demonstrating the
fusion proteins produced from four different clones, as detected by binding of antisera specific for the
bacterial portion of the fusion protein. Detection of fusion proteins in the expected size range using antisera
specific to the vector-derived portion is dependent on the presence of a COP-1 gene sequence since the
[...] protein (approximately 3,900 daltons) and approximately 130-200 amino acids encoded by the random
sequence genes (15,000-23,000 daltons). Based on migration through SDS gels, the molecular weights of
the fusion polypeptides are in the correct range. Clone 4 produces lower levels of several smaller
polypeptides which may be generated upon degradation of the largest species.

The amino acids which are at the junction of the vector-derived and COP-1 polypeptides are encoded
by the 5' oligonucleotide adaptor duplex. The adaptor duplex can be designed to include a methionine
residue between the bacterial protein and the COP-1 sequences. In this case, the COP-1 polypeptides can
be released from the fusion protein by treatment with cyanogen bromide, which cleaves on the carboxyl
side of methionine residues. Both forms of COP-1 (the fusions and the free polypeptides) are
tested for biological activity.

Alternatively, other fusion vectors can be employed to express COP-1 fusion proteins,
and other strategies can be employed to release the COP-1 polypeptides from the fusion proteins. For
example, to improve the expression of rCOP-1 polypeptides in E. coli, genes coding for rCOP-1-77 and
rCOP-1-19 were subcloned from pREV 2.1 to pBG3-2ΔN, a plasmid used to express Protein A. pBG3-2ΔN
has been deposited as described in U.S. Patent No. 4,691,009. The deposit was made on November 20,
1984 and given the accession number of NRRL B-15910. The rCOP-1 genes were isolated from the pREV
2.1 recombinant plasmids by digestion with NcoI and EcoRI. The NcoI site occurs in the 5' linker
of the rCOP-1 genes and the EcoRI site is in pREV 2.1 downstream of the rCOP-1 gene.
After digestion with the restriction enzymes, the ends of the rCOP-1 genes are blunted with
deoxyribonucleotides and Klenow fragment. DNA fragments containing the rCOP-1 genes are isolated from
agarose gels. pBG3-2ΔN is digested with NheI, treated with phosphatase and the ends of the DNA are
blunted with deoxyribonucleotides and Klenow fragment. After ligation and plating, pBG3-2ΔN recombinants
bearing rCOP-1 genes in the correct orientation are identified by DNA sequence analysis. The resulting
plasmids encode fusion proteins consisting of glucuronidase, Protein A, and rCOP-1 sequences. A
methionine residue occurs between the Protein A and rCOP-1 sequences, originating from the 5' linker,
in order that the COP-1 polypeptide may be cleaved from the fusion protein with cyanogen bromide. The DNA and
amino acid sequences for rCOP-1-77 and rCOP-1-19 are shown in Figures 11 and 12. rCOP-1-19
contains oligonucleotide duplexes encoding the following amino acid segments: YKK, AAE, [...],
KEA, and KAA. rCOP-1-77 contains oligonucleotide duplexes encoding the following amino
acid segments: [...], AAK, and AAA. The N-terminal alanine residue in each sequence is left
behind following CNBr cleavage of the fusion protein.
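
The release step can be pictured as cutting the fusion sequence after every methionine. The sketch below uses a hypothetical carrier sequence and a hypothetical COP-1-like polypeptide; it ignores the chemistry of the reaction (cyanogen bromide converts the methionine at each cleavage site to homoserine or its lactone) and only tracks where the chain is cut.

```python
def cnbr_fragments(protein):
    """Split a one-letter protein sequence on the carboxyl side of each Met."""
    fragments, start = [], 0
    for i, residue in enumerate(protein):
        if residue == "M":
            fragments.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        fragments.append(protein[start:])
    return fragments


# Hypothetical carrier ending in Met, fused to a COP-1-like sequence
# that begins with alanine and contains no methionine.
carrier = "MKTLLM"
cop1_like = "AKKAEAEKAK"
```

Because a polypeptide built only from alanine, glutamic acid, lysine, and tyrosine contains no internal methionine, `cnbr_fragments(carrier + cop1_like)` returns the COP-1-like sequence intact as the last fragment, beginning with its N-terminal alanine.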

The invention should not be limited to the examples described above.

Example 5 - Expression of Random Sequence Genes in Non-Fusion Vectors 

Synthetic genes can be cloned into expression vectors such that the polypeptide products are not fused
to vector-derived protein sequences. As in fusion vectors, these non-fusion vectors contain the appropriate
transcriptional and translational signals; however, the synthetic genes are linked directly adjacent to the
translation initiation signal such that they contain a methionine residue as the amino-terminal amino acid.
This single methionine can be removed from COP-1 polypeptides by cyanogen bromide cleavage. COP-1 
polypeptides with and without this amino-terminal methionine are tested for biological activity. 



Example 6 - Purification of rCOP-1 Polypeptides 

The purification of rCOP-1 polypeptides can be accomplished by a number of methods which are well 
known to those skilled in this art. For example, E. coli cells expressing Protein A/rCOP-1 fusion proteins are
grown in a fermenter, collected by centrifugation, and lysed using a dynamill. The extract is centrifuged to
remove debris, adjusted to contain 8 M urea, and chromatographed on an S-Sepharose column using a
sodium chloride gradient for elution. Fractions containing the fusion protein are dialyzed against a solution
of glycerol in phosphate buffered saline; the dialysate is centrifuged to remove contaminating proteins that
precipitate during dialysis. The Protein A/rCOP-1 fusion protein is cleaved with cyanogen bromide and the 
rCOP-1 polypeptide is purified by gel filtration and reverse phase HPLC. 



Example 7 - EAE Experiments 

rCOP-1 has been tested for efficacy in suppressing experimental allergic encephalomyelitis (EAE). As 
described above, EAE is a T-cell mediated autoimmune disease that is employed as a model for the human 
disease multiple sclerosis. EAE experiments are performed essentially as described by Swanborg 
(Swanborg RH [1988] "Experimental Allergic Encephalomyelitis," in Methods in Enzymology, vol. 162, p.
413, Academic Press, Inc.). For example, the disease is induced in Hartley guinea pigs by a single
subcutaneous injection of 10 µg of guinea pig myelin basic protein in Freund's adjuvant containing 100 µg
of Mycobacterium tuberculosis. Onset of disease occurs about 12 to 20 days after induction. The disease is
scored on a scale of 0-4: 0 = no disease; 1 = loss of coordination in hind limbs; 2 = paralysis of one or
both hind limbs; 3 = paralysis extending to include one or both front limbs, which can include incontinence of
bladder or bowel; and 4 = extensive paralysis, inability to move. Animals are scored every 2-3 days from
the onset of disease, and most animals spontaneously recover from the disease. The treatment protocol
consists of intramuscular injections of 500 mg of test material at 1, 6, and 11 days after induction of
disease. The dosage, route of administration, and schedule for treatments can be varied. Also, EAE
experiments can be performed in other species, including rats and mice. Other variations of the experimental
protocols and scoring may be used.

Two rCOP-1 molecules, rCOP-1-77 and rCOP-1-19, have recently been tested in the EAE experiments.
The production and purification of these molecules were performed in accordance with the procedures
described in Examples 4 and 6. Guinea pigs treated with rCOP-1-77 or rCOP-1-19 were compared to
animals treated with myelin basic protein, a positive control, and to an untreated group. The graph in Figure
13 shows the average EAE scores for each treatment group versus days after induction. In Table 1, several
aspects of disease, such as incidence, day of onset, maximum severity, and duration, are compared.

Table 1.

Effects of rCOP-1 and MBP on EAE

Group                  Incidence 1   Day of Onset 2        Max. Severity 2    Duration 2
Untreated              7/7           13.5 ± 1.0 (13-15)    2.3 ± 0.6 (1-3)    10.8 ± 3.8 (4-15)
rCOP-1-19              8/8           16.8 ± 3.7 (13-25)    2.8 ± 1.4 (1-4)    8.63 ± 5.2 (2-15)
rCOP-1-77              7/8           17.6 ± 2.8 (15-22)    2.9 ± 1.0 (1-4)    9.9 ± 4.5 (2-15)
Myelin Basic Protein   8/8           18.1 ± 2.6 (15-20)    2.3 ± 1.1 (1-4)    5.3 ± 5.5 (2-15)

1 Incidence = [...]
2 Values are the mean ± standard deviation. Ranges are in parentheses.
3 One animal in this group died.
The results can be summarized as follows: rCOP-1-77 and rCOP-1-19 both delayed the onset of disease
under the treatment regimen described here. The rCOP-1 molecules did not affect other measures of
disease: incidence, maximum severity, or duration. Myelin basic protein delayed onset and decreased
disease duration but did not significantly alter severity or incidence. One note about this particular
experiment is that the maximum severity for the untreated animals (2.3) is unusually low; in a pilot
experiment with guinea pigs, the severity scores were 2 and 4. It may be possible to optimize the
effects of rCOP-1 by varying the treatment procedure.



Example 8 - Other Applications for Random Sequence Polymers of Amino Acids

[...] sequence polypeptides having a predetermined amino acid composition
is as supplements for diets deficient in certain amino acids.

Claims

[...] according to claim 1 [...]

[...] predetermined amino acid constituents, said method comprising the synthesis of genes encoding said
random polymers, said synthesis comprising the polymerization of small oligonucleotide duplexes, said 
method for making polypeptides further comprising the cloning of said synthetic genes into an expression 
vector and transferring said vector into a bacteria or eukaryotic cell capable of expressing said polypeptide. 
15. A method, according to claim 14, wherein said bacteria is an Escherichia coli.
16. A method for making a fusion polypeptide having one portion comprised of all or part of a
heterologous polypeptide and a second portion comprised of a random sequence wherein said random 
sequence portion has predetermined amino acid constituents, and said method comprises the synthesis of 
genes encoding said random polymers, said synthesis comprising the polymerization of small 
oligonucleotide duplexes, said method further comprising the cloning of said synthetic genes into an 
expression vector adjacent to a DNA sequence encoding all or part of said heterologous polypeptide and 
transferring said vector into a bacteria or eukaryotic cell capable of expressing said fusion polypeptide. 

17. A method for synthesizing and identifying a biologically or immunologically active polypeptide, or
mixture of polypeptides, said method comprising the following steps:

(a) synthesizing genes encoding random polypeptides by polymerization of small oligonucleotide
duplexes;

(b) cloning each of said synthetic genes into a vector and transferring said vector into a host capable
of expressing said polypeptides;

(c) growing said hosts under conditions which permit the formation of recombinant colonies which
each express one recombinant gene;

(d) testing the polypeptide, or a mixture of the recombinant polypeptides, combined either before or 
after isolation from said colonies, for evidence of biological and/or immunological activity; 

(e) where activity is observed for the mixture of polypeptides, generating smaller subsets of that 
combination and testing each of these subsets for biological activity; and 

(f) repeating step (e) until the active component(s) or a suitable mixture thereof is obtained.

18. A synthetic gene which codes for rCOP-1-19. 

19. A synthetic gene which codes for rCOP-1-77. 
Figure 1

Strategy for Synthesizing Random Genes and
Identifying Recombinant Polypeptides with Biological Activity

mixture of oligonucleotide duplexes
    -> ligate ->
random sequence synthetic genes
    -> clone in an expression vector, transform into host ->
recombinant colonies, each contains one synthetic gene

Identification of biologically active polypeptides

pool colonies (e.g. 1000)
    -> isolate polypeptides
    -> test for biological activity
fractionate into smaller pools of colonies (e.g. 10 pools of 100 colonies)
select individual colonies
    -> isolate the polypeptide
    -> test for biological activity
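
The pooling strategy of Figure 1 amounts to a recursive subdivision search. The following Python sketch is one way to express it, under stated assumptions: `assay` is a hypothetical stand-in for the biological activity test, a pool is assumed to test positive whenever it contains at least one active colony, and all function and parameter names are illustrative rather than taken from the specification.

```python
from typing import Callable, List, Sequence


def find_active_colonies(
    colonies: Sequence[int],
    assay: Callable[[Sequence[int]], bool],
    n_pools: int = 10,
) -> List[int]:
    """Return the active colonies by recursively subdividing positive pools.

    n_pools must be at least 2 so that every subdivision shrinks the pool.
    """
    # Test the pooled polypeptides first; an inactive pool is discarded whole.
    if not assay(colonies):
        return []
    # A positive pool holding a single colony identifies an active polypeptide.
    if len(colonies) == 1:
        return list(colonies)
    # Fractionate the positive pool into at most n_pools smaller pools.
    pool_size = -(-len(colonies) // n_pools)  # ceiling division
    active: List[int] = []
    for start in range(0, len(colonies), pool_size):
        sub_pool = colonies[start:start + pool_size]
        active.extend(find_active_colonies(sub_pool, assay, n_pools))
    return active


# Hypothetical example: 1000 colonies, of which colonies 42 and 977 are active.
hidden_active = {42, 977}
colonies = list(range(1000))
hits = find_active_colonies(colonies, lambda pool: any(c in hidden_active for c in pool))
print(hits)  # prints [42, 977]
```

With 1000 colonies and 10 pools per round, each active colony is reached after pools of 1000, 100, 10, and 1 have been tested, which mirrors the "10 pools of 100 colonies" step in the figure.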
Figure 2

Synthesis of Genes Encoding
Random-Sequence
Amino Acid Polymers

Oligonucleotides (9-mers), coding and noncoding strands
    -> anneal ->
Duplexes
    -> ligate ->
Synthetic Genes
    -> expression ->
Polypeptides

Ala-Ala-Ala-Lys-Lys-Lys-Ala-Ala-Ala-Lys-Lys-Lys-Ala-Ala-Ala

Ala-Ala-Ala-Ala-Ala-Ala-Lys-Lys-Lys-Lys-Lys-Lys-Lys-Lys-Lys

Lys-Lys-Lys-Ala-Ala-Ala-Ala-Ala-Ala-Lys-Lys-Lys-Ala-Ala-Ala

The two duplex types shown encode Ala-Ala-Ala and Lys-Lys-Lys.
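
The random ligation shown in Figure 2 can be simulated in a few lines of Python. This is only an illustrative sketch: the codon choices (GCA for alanine, AAA for lysine) are assumptions, since the figure does not state which codons the two duplexes use, and the function names are hypothetical.

```python
import random

# Assumed coding strands (5'->3') of the two 9-mer duplexes in Figure 2.
DUPLEX_CODING = ["GCAGCAGCA", "AAAAAAAAA"]  # Ala-Ala-Ala and Lys-Lys-Lys
CODON_TO_AA = {"GCA": "Ala", "AAA": "Lys"}


def ligate(n_duplexes: int, rng: random.Random) -> str:
    """Join n randomly chosen duplexes end to end (coding strand only)."""
    return "".join(rng.choice(DUPLEX_CODING) for _ in range(n_duplexes))


def translate(gene: str) -> str:
    """Translate the coding strand codon by codon."""
    return "-".join(CODON_TO_AA[gene[i:i + 3]] for i in range(0, len(gene), 3))


rng = random.Random(0)
for _ in range(3):
    # Each synthetic gene of five duplexes encodes a 15-residue polypeptide.
    print(translate(ligate(5, rng)))
```

Because ligation joins whole duplexes, the residues always appear in blocks of three, as in the three example polypeptides of the figure.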
Figure 3

All Possible 3 Amino Acid Combinations
and Their Percent Occurrence in Cop 1

AAA  7.872     AAE  2.624     AAK  6.560     AAY  1.312
AEA  2.624     AEE  0.875     AEK  2.187     AEY  0.437
AKA  6.560     AKE  2.187     AKK  5.466     AKY  1.093
AYA  1.312     AYE  0.437     AYK  1.093     AYY  0.219
EAA  2.624     EAE  0.875     EAK  2.187     EAY  0.437
EEA  0.875     EEE  0.292     EEK  0.729     EEY  0.146
EKA  2.187     EKE  0.729     EKK  1.822     EKY  0.364
EYA  0.437     EYE  0.146     EYK  0.364     EYY  0.073
KAA  6.560     KAE  2.187     KAK  5.466     KAY  1.093
KEA  2.187     KEE  0.729     KEK  1.822     KEY  0.364
KKA  5.466     KKE  1.822     KKK  4.555     KKY  0.911
KYA  1.093     KYE  0.364     KYK  0.911     KYY  0.182
YAA  1.312     YAE  0.437     YAK  1.093     YAY  0.219
YEA  0.437     YEE  0.146     YEK  0.364     YEY  0.073
YKA  1.093     YKE  0.364     YKK  0.911     YKY  0.182
YYA  0.219     YYE  0.073     YYK  0.182     YYY  0.036

A=Alanine
E=Glutamic Acid
K=Lysine
Y=Tyrosine
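
The percentages in Figure 3 are consistent with treating each position of a three-residue segment as an independent draw from the 6:2:5:1 molar ratio of alanine, glutamic acid, lysine, and tyrosine given in the Background. The short Python sketch below reproduces the calculation under that independence assumption; the function name is illustrative only.

```python
from itertools import product

# Molar ratio of the four COP-1 amino acids (Ala : Glu : Lys : Tyr).
MOLAR_RATIO = {"A": 6, "E": 2, "K": 5, "Y": 1}


def tripeptide_percentages(ratio):
    """Percent occurrence of every 3-residue segment, assuming independent positions."""
    total = sum(ratio.values())
    freq = {aa: n / total for aa, n in ratio.items()}
    return {
        a + b + c: 100 * freq[a] * freq[b] * freq[c]
        for a, b, c in product(ratio, repeat=3)
    }


pct = tripeptide_percentages(MOLAR_RATIO)
print(f"AAA {pct['AAA']:.3f}")  # prints AAA 7.872
print(f"KKK {pct['KKK']:.3f}")  # prints KKK 4.555
print(f"YYY {pct['YYY']:.3f}")  # prints YYY 0.036
```

For example, AAA occurs with probability (6/14)^3 = 216/2744, or about 7.872 percent, and the 64 values sum to 100 percent.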
Figure 4

9-Nucleotide Duplexes and Adaptors
for Cop 1 Gene Synthesis

Examples of 9-Nucleotide Duplexes for Cop 1 Genes:

Coding    (5' end)    AAGAAGGCA     GAAGCAGAA    (3' end)
Noncoding (3' end)   TTTCTTCCG     TCTTCGTCT     (5' end)
Amino acids            K  K  A       E  A  E

Adaptors:

For the 5' end:   GATC-   (BamH I)
For the 3' end:   -AGCT   (Sac I)
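
The one-nucleotide stagger between the two strands of each 9-mer duplex in Figure 4 can be checked with a short Python sketch. It assumes, as in both example duplexes, that every coding 9-mer ends in A, so that the noncoding strand begins (at its 3' end) with a T that pairs with the last base of whichever duplex is ligated upstream. The function name is hypothetical.

```python
# Watson-Crick complement of each base.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def noncoding_9mer(coding: str, upstream_last_base: str = "A") -> str:
    """Return the noncoding strand, written 3'->5', of a staggered 9-mer duplex.

    The noncoding strand is shifted one nucleotide toward the 5' end of the
    coding strand: it pairs with the last base of the upstream 9-mer (assumed
    to be A) and with the first eight bases of its own coding strand.
    """
    if len(coding) != 9:
        raise ValueError("expected a 9-nucleotide coding strand")
    return (upstream_last_base + coding[:-1]).translate(COMPLEMENT)


for coding in ("AAGAAGGCA", "GAAGCAGAA"):
    print(coding, noncoding_9mer(coding))
# prints:
# AAGAAGGCA TTTCTTCCG
# GAAGCAGAA TCTTCGTCT
```

Because every duplex exposes the same single-base overhangs under this assumption, any duplex can follow any other, which is what allows the 9-mers to ligate in random order.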
Figure 5

Construction and Cloning of
Synthetic COP 1 Genes

5' adaptor (GATC-, BamH I)  +  9-mers  +  3' adaptor (-AGCT, Sac I)
    -> ligase ->
genes flanked by the BamH I and Sac I adaptors
    -> size-select genes ->
    -> clone into vector (Plasmid Vector cut with BamH I and Sac I) ->
Synthetic Gene in Plasmid Vector
Figure 6

Sequence Analysis of a
Synthetic Gene

amino acids:   Y  K  K   E  A  E   K  A  K
nucleotides:   TACAAGAAA|GAAGCAGAA|AAGGCTAAA|

amino acids:   Y  K  K   Y  K  K   K  K  A
nucleotides:   TACAAGAAA|TACAAGAAA|AAGAAGGCA|

amino acids:   E  A  E   K  K  A   E  A  E
nucleotides:   GAAGCAGAA|AAGAAGGCA|GAAGCAGAA|

amino acids:   K  K  A   E  A  E   E  A  E
nucleotides:   AAGAAGGCA|GAAGCAGAA|GAAGCAGAA|

amino acids:   K  K  A   K  A  K   E  A  E
nucleotides:   AAGAAGGCA|AAGGCTAAA|GAAGCAGAA|

amino acids:   K  K  A   K  K  A   K  A  K
nucleotides:   AAGAAGGCA|AAGAAGGCA|AAGGCTAAA|

The lines in the nucleotide sequence mark the junctions
between 9-mer duplexes.

Amino Acids
A=alanine
K=lysine
E=glutamic acid
Y=tyrosine
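
The reading of Figure 6, in which a sequenced gene is divided at the 9-mer junctions and each block is translated, can be sketched in Python as follows. The codon table lists only the six codons that appear in the COP 1 duplexes of this document, the sample sequence is the row of the figure that reads Y K K / Y K K / K K A, and the function names are illustrative assumptions.

```python
# Standard genetic code entries for the codons used in the COP 1 9-mers.
CODON_TO_AA = {
    "GCA": "A", "GCT": "A",  # alanine
    "GAA": "E",              # glutamic acid
    "AAA": "K", "AAG": "K",  # lysine
    "TAC": "Y",              # tyrosine
}


def split_9mers(gene: str) -> list:
    """Split a coding sequence at the junctions between 9-mer duplexes."""
    if len(gene) % 9 != 0:
        raise ValueError("sequence length is not a multiple of 9")
    return [gene[i:i + 9] for i in range(0, len(gene), 9)]


def translate(block: str) -> str:
    """Translate a coding block into one-letter amino acid codes."""
    return "".join(CODON_TO_AA[block[i:i + 3]] for i in range(0, len(block), 3))


# One row of Figure 6, written as three adjacent 9-mers.
row = "TACAAGAAA" "TACAAGAAA" "AAGAAGGCA"
for block in split_9mers(row):
    print(block, translate(block))
# prints:
# TACAAGAAA YKK
# TACAAGAAA YKK
# AAGAAGGCA KKA
```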
Figure 8

Variations of Random-Gene Synthesis
using Small Duplexes

One Nucleotide Extension:

3-mers       GCA     GAA
            TCG     TCT

6-mers       GCAAAA     GAAGCA
            TCGTTT     TCTTCG

12-mers      GCAGAAGCAAAA
            TCGTCTTCGTTT

Three Nucleotide Extensions, 12-mer Duplexes:

duplex         XXXXXXXXXGCA
            CGTXXXXXXXXX

    -> ligate ->

gene           XXXXXXXXXGCAXXXXXXXXXGCAXXXXXXXXXGCA
            CGTXXXXXXXXXCGTXXXXXXXXXCGTXXXXXXXXX

polypeptide    Ala aa aa aa Ala aa aa aa Ala aa aa aa Ala

xxx = any codon
aa = amino acid specified by codon xxx
Ala = alanine
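
The three-nucleotide-extension variant of Figure 8 places a fixed GCA codon at the end of every 12-mer, so alanine recurs at every fourth position of the product. The Python sketch below illustrates this with randomly chosen variable codons; the set of codons offered for the "xxx" positions is a hypothetical choice made only for the example, and the function names are illustrative.

```python
import random

# Hypothetical codons for the variable "xxx" positions (Ala, Glu, Lys, Tyr).
VARIABLE_CODONS = ["GCA", "GAA", "AAA", "TAC"]
CODON_TO_AA = {"GCA": "Ala", "GAA": "Glu", "AAA": "Lys", "TAC": "Tyr"}
FIXED_CODON = "GCA"  # the codon that forms the three-nucleotide extension


def random_12mer(rng: random.Random) -> str:
    """Coding strand of one 12-mer duplex: three variable codons, then GCA."""
    return "".join(rng.choice(VARIABLE_CODONS) for _ in range(3)) + FIXED_CODON


def assemble_gene(n_duplexes: int, rng: random.Random) -> str:
    """Ligate n duplexes head to tail (coding strand only)."""
    return "".join(random_12mer(rng) for _ in range(n_duplexes))


def translate(gene: str) -> list:
    """Translate the coding strand codon by codon."""
    return [CODON_TO_AA[gene[i:i + 3]] for i in range(0, len(gene), 3)]


peptide = translate(assemble_gene(3, random.Random(0)))
print(" ".join(peptide))
# Every fourth residue (positions 4, 8, 12) is Ala regardless of the random draw.
```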
Figure 9

Synthesis of Single-Stranded DNA Encoding
Random Sequence Amino Acid Polymers

3 Nucleotide Segments for COP 1

            Ala          Glu          Lys          Tyr
            5' GCA 3'    5' GAA 3'    5' AAA 3'    5' TAC 3'
ratio:      6            2            5            1

    -> Polymerize (A. Solution Chemistry or B. DNA Synthesizer) ->
Single-Stranded DNA (5' to 3')
    -> Enzymatic Synthesis of the Second Strand ->
Double-Stranded DNA
    -> Clone into Expression Vector
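
The scheme of Figure 9 can be illustrated with a small Python simulation: trinucleotides are drawn at random in proportion to the 6:2:5:1 ratio given for COP-1 in the Background, joined into a single strand, and the second strand is then derived as the reverse complement. This is a sketch only; it models neither the coupling chemistry nor the enzymatic step, and the function names are hypothetical.

```python
import random

# Trinucleotide for each amino acid and its relative molar amount
# (Ala : Glu : Lys : Tyr = 6 : 2 : 5 : 1).
TRINUCLEOTIDES = {"GCA": 6, "GAA": 2, "AAA": 5, "TAC": 1}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def polymerize(n_codons: int, rng: random.Random) -> str:
    """Build a single-stranded coding sequence (5'->3') from weighted random trinucleotides."""
    codons = rng.choices(
        list(TRINUCLEOTIDES), weights=list(TRINUCLEOTIDES.values()), k=n_codons
    )
    return "".join(codons)


def second_strand(single_strand: str) -> str:
    """Return the complementary strand, written 5'->3' (reverse complement)."""
    return single_strand.translate(COMPLEMENT)[::-1]


rng = random.Random(0)
coding = polymerize(10, rng)
print("coding strand   5'", coding, "3'")
print("second strand   5'", second_strand(coding), "3'")
```

Over many codons the amino acid composition of the encoded polypeptide approaches the 6:2:5:1 ratio, while the order of residues remains random.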
Figure 10

Phosphoramidite Trinucleotide for Random
Gene Synthesis Using a DNA Synthesizer

[Chemical structures: Phosphoramidite Trinucleotide; Phosphoramidite Mononucleotide]
FIGURE 11
rCOP-1-77

[Sequence listing: nucleotide sequence of both strands of the rCOP-1-77 synthetic gene with the encoded amino acid sequence. Final coding-strand line: AAGAAATACAAGAAAGAAGCAGAAAAGGCTAAAGAAGCAGAATAA]
FIGURE 12
rCOP-1-19

[Sequence listing: nucleotide sequence of both strands of the rCOP-1-19 synthetic gene with the encoded amino acid sequence. Final lines of the listing:]

GCGAAAGCAAAGGCTGCATAA
CGCTTTCGTTTCCGACGTATT



